feat: Support OFED with DTK #711

Merged
merged 1 commit into from
Jan 8, 2024

Member

@rollandf rollandf commented Dec 19, 2023

In OpenShift, for the OFED container to be able to download and compile the needed kernel files, a cluster-wide entitlement must be installed.

This requirement is not user friendly.

To avoid this, OpenShift distributions ship a container image that contains the needed files.
This image is called the Driver Toolkit (DTK).

By running this container as a sidecar to the MOFED container, the modules can be compiled without an entitlement.

Changes:
API:

  • Use of DTK is 'true' by default and can be changed through an env
    variable in the Operator Deployment

OFED state:

  • On OCP, when 'useOcpDriverToolkit' is true, find the DTK image based on the node's NFD label.
  • If a DTK image is available, add a DTK container to the MOFED DaemonSet and change the entrypoint logic.
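
The OFED state flow described above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the label key is an assumption about what NFD publishes for the OSTree version, and `ostreeVersionFromLabels` and `shouldAddDTKContainer` are hypothetical helper names.

```go
package main

import (
	"errors"
	"fmt"
)

// ostreeVersionLabel is an ASSUMED NFD label key for the node's OSTree
// version; verify it against the labels your NFD deployment publishes.
const ostreeVersionLabel = "feature.node.kubernetes.io/system-os_release.OSTREE_VERSION"

var errNoOSTreeLabel = errors.New("node has no OSTree version label")

// ostreeVersionFromLabels (hypothetical helper) returns the OSTree version
// advertised in a node's labels, or an error if the label is absent or empty.
func ostreeVersionFromLabels(labels map[string]string) (string, error) {
	v, ok := labels[ostreeVersionLabel]
	if !ok || v == "" {
		return "", errNoOSTreeLabel
	}
	return v, nil
}

// shouldAddDTKContainer (hypothetical helper) mirrors the rule in the PR
// description: add the DTK container only on OpenShift, when DTK use is
// enabled, and when a DTK image was actually found.
func shouldAddDTKContainer(isOpenShift, useDTK bool, dtkImage string) bool {
	return isOpenShift && useDTK && dtkImage != ""
}

func main() {
	labels := map[string]string{ostreeVersionLabel: "412.86.202301311551-0"}
	version, err := ostreeVersionFromLabels(labels)
	if err != nil {
		fmt.Println("skipping DTK:", err)
		return
	}
	fmt.Println("OSTree version:", version)
	// The image reference below is a placeholder, not a real image.
	fmt.Println("add DTK container:", shouldAddDTKContainer(true, true, "example.invalid/dtk:placeholder"))
}
```

When no image is found, the DaemonSet would simply be rendered without the DTK container, which matches the "If available" wording above.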

@rollandf rollandf added the on hold label Dec 19, 2023
@rollandf
Member Author

Note that the API with MOFED container is not finalized yet.

@rollandf rollandf force-pushed the dtk-new branch 2 times, most recently from 2c6b188 to 52ed808 December 20, 2023 10:05
api/v1alpha1/nicclusterpolicy_types.go (outdated)
manifests/state-ofed-driver/0050_ofed-driver-ds.yaml (outdated)
}

func (d *openShiftClusterProvider) GetClusterType() clustertype.Type {
	return clustertype.Kubernetes
Contributor

nit: switch to clusterType.Openshift to be consistent with what the methods below return.

Member Author

Done

pkg/state/state_ofed.go (outdated)
manifests/state-ofed-driver/0050_ofed-driver-ds.yaml (outdated)
@rollandf rollandf force-pushed the dtk-new branch 4 times, most recently from 864d53f to a274a04 December 25, 2023 06:48
@rollandf
Member Author

/retest-blackduck_scan

@@ -447,6 +447,7 @@ containerResources:
| `ofedDriver.upgradePolicy.waitForCompletion.podSelector` | string | not set | specifies a label selector for the pods to wait for completion before starting the driver upgrade |
| `ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds` | int | not set | specify the length of time in seconds to wait before giving up for workload to finish, zero means infinite |
| `ofedDriver.containerResources` | [] | not set | Optional [resource requests and limits](#container-resources) for the `mofed-container` container |
| `ofedDriver.useOcpDriverToolkit` | bool | `true` | In OpenShift, use Driver Toolkit image to compile OFED drivers |
Collaborator

@adrianchiris adrianchiris Dec 25, 2023

Prevent setting this to true in the validation webhook if we are not on OpenShift? WDYT?

i.e. if not OpenShift and this is true -> fail CR create/update

Collaborator

I now remember we discussed the flow :) so maybe we should just ignore this on non-OpenShift clusters.

Collaborator

Alternative: don't introduce any API changes for this in the CRD; add an env var in the operator (DISABLE_DTK or similar),

since on OpenShift we do want DTK to always run. In the special cases where we don't, we can deploy the operator with this env var.

This is the best option IMO, as going forward we want either DTK or precompiled.

Member Author

Removed from Helm
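
The outcome of this thread (an operator-level environment variable instead of a CRD or Helm field, ignored on non-OpenShift clusters) could look roughly like the sketch below. The variable name `USE_DTK` and the helper names are hypothetical; the thread only suggests "DISABLE_DTK or similar" and does not state the final name.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// useDTKEnvVar is a HYPOTHETICAL variable name used for illustration only;
// the name the operator actually reads is not given in this conversation.
const useDTKEnvVar = "USE_DTK"

// dtkEnabled reports whether DTK use is enabled. It defaults to true when
// the variable is unset or cannot be parsed as a boolean, matching the
// "true by default" behavior in the PR description.
func dtkEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(useDTKEnvVar)
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

// useDTK applies the setting only on OpenShift; on any other cluster type
// the setting is ignored and DTK is never used.
func useDTK(isOpenShift bool, lookup func(string) (string, bool)) bool {
	return isOpenShift && dtkEnabled(lookup)
}

func main() {
	// os.LookupEnv has the signature func(string) (string, bool).
	fmt.Println("DTK on OpenShift:", useDTK(true, os.LookupEnv))
	fmt.Println("DTK on vanilla Kubernetes:", useDTK(false, os.LookupEnv))
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the default and override behavior easy to test without touching the process environment.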

@@ -226,6 +226,7 @@ ofedDriver:
# podSelector: "app=myapp"
# specify the length of time in seconds to wait before giving up for workload to finish, zero means infinite
# timeoutSeconds: 300
useOcpDriverToolkit: true
Collaborator

@adrianchiris adrianchiris Dec 25, 2023

Do we really want to have this defined in values?
(Also, I don't see it used in the NicClusterPolicy CR template.)

Member Author

Removed from Helm

@@ -735,3 +753,27 @@ func (s *stateOFED) handleRepoConfig(
}
return nil
}

// getOCPDriverToolkitImage gets the DTK ImageStream and returns the DTK image according to RHCOS version
func (s *stateOFED) getOCPDriverToolkitImage(ctx context.Context, rhcosVersion string) (string, error) {
Collaborator

rhcosVersion -> ostreeVersion

This also exists for RHEL (OSTREE_VERSION), I assume,
and we should support DTK on RHEL as well, right?

OSTREE_VERSION="412.86.202301311551-0"

Member Author

Renamed
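
To illustrate what a lookup keyed by `ostreeVersion` might involve, here is a simplified sketch. It assumes the DTK ImageStream exposes one tag per OSTree version, which this conversation does not confirm; `imageStreamTag` and `findDTKImage` are hypothetical stand-ins that avoid the real OpenShift client types.

```go
package main

import "fmt"

// imageStreamTag is a simplified, HYPOTHETICAL stand-in for one tag entry
// of the DTK ImageStream; the real OpenShift API type is richer.
type imageStreamTag struct {
	Name  string // tag name, ASSUMED here to equal the OSTree version
	Image string // resolved image reference for that tag
}

// findDTKImage returns the image whose tag name matches ostreeVersion.
// It fails if no tag matches or the matching tag has no resolved image.
func findDTKImage(tags []imageStreamTag, ostreeVersion string) (string, error) {
	for _, t := range tags {
		if t.Name != ostreeVersion {
			continue
		}
		if t.Image == "" {
			return "", fmt.Errorf("tag %q has no resolved image", t.Name)
		}
		return t.Image, nil
	}
	return "", fmt.Errorf("no DTK image found for OSTree version %q", ostreeVersion)
}

func main() {
	// Placeholder data; the image reference is not a real image.
	tags := []imageStreamTag{
		{Name: "412.86.202301311551-0", Image: "example.invalid/driver-toolkit:placeholder"},
	}
	image, err := findDTKImage(tags, "412.86.202301311551-0")
	if err != nil {
		fmt.Println("DTK not available:", err)
		return
	}
	fmt.Println("DTK image:", image)
}
```

Returning an error rather than an empty string lets the caller distinguish "no DTK for this node" from a successful lookup and decide whether to fall back.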

{{- end }}
{{- end }}
volumeMounts:
{{- if .AdditionalVolumeMounts.VolumeMounts }}
Collaborator

Do we need additional mounts for DTK?

Member Author

Removed

@rollandf
Member Author

rollandf commented Jan 3, 2024

/retest-nic_operator_kind

@rollandf rollandf added on hold and removed on hold labels Jan 4, 2024
In OpenShift, for the OFED container to be able
to download and compile the needed kernel files, a
cluster-wide entitlement must be installed.

This requirement is not user friendly.

To avoid this, OpenShift distributions ship a container image
that contains the needed files.
This image is called the Driver Toolkit (DTK).

By running this container as a sidecar to the MOFED container,
the modules can be compiled without an entitlement.

Changes:
API:
 - Use of DTK is 'true' by default and can be changed through an env
   variable in the Operator Deployment

OFED state:
 - On OCP, when 'useOcpDriverToolkit' is true,
   find the DTK image based on the node's NFD label.
 - If a DTK image is available, add a DTK container to the MOFED
   DaemonSet and change the entrypoint logic.

Signed-off-by: Fred Rolland <[email protected]>
Collaborator

@adrianchiris adrianchiris left a comment

lgtm

@rollandf
Member Author

rollandf commented Jan 7, 2024

/retest-nic_operator_kind

@rollandf rollandf removed the on hold label Jan 8, 2024
@rollandf
Member Author

rollandf commented Jan 8, 2024

@e0ne @adrianchiris Can we have this in the beta?

@e0ne e0ne merged commit 9f03156 into Mellanox:master Jan 8, 2024
16 checks passed
@rollandf rollandf deleted the dtk-new branch February 28, 2024 08:10
4 participants